Chunker and Shallow Parser for Free Word Order Languages: An Approach based on Valency Theory and Feature Structures
Authors
Abstract
Free word order languages have relatively unrestricted local word group or phrase structures, which makes chunking quite challenging. On the other hand, a robust chunker can drastically reduce the complexity of the parser that follows it. We present here a computational framework for chunking free word order languages based on a generalization of valency theory. Every word has certain valency constraints that allow it to form local word groups with adjacent words. The groups so formed have their own valency constraints and can thus recursively participate in the grouping process. This is implemented through feature structure unification. Sentence-level constraints on the groups help the chunker reject invalid groupings and detect errors in the POS tags supplied by the POS tagger that precedes the chunker. Based on this approach, a Bengali chunker has been implemented with reasonably good accuracy. The paper also describes how this method can be generalized to develop a complete parser for free word order languages by incorporating semantic information as well as probabilistic models.
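As a rough illustration only, the valency-driven grouping described above might be sketched as follows. This is not the authors' implementation. The flat-dict feature structures, the hypothetical `val` slot format with `dir`/`needs`/`yields` keys, the greedy left-to-right merge order, and the rule that any leftover slot marks a possible POS-tag error are all assumptions made for the example. The transliterated Bengali-like lexicon is likewise hypothetical.

```python
from typing import Optional

# ASSUMPTION: feature structures are plain (possibly nested) dicts, and a
# hypothetical "val" key holds valency slots: {"dir": "L"|"R", "needs": FS, "yields": FS}.

def unify(a: dict, b: dict) -> Optional[dict]:
    """Unify two feature structures; return None if any atomic values clash."""
    result = dict(a)
    for key, b_val in b.items():
        if key not in result:
            result[key] = b_val
        elif isinstance(result[key], dict) and isinstance(b_val, dict):
            sub = unify(result[key], b_val)
            if sub is None:
                return None
            result[key] = sub
        elif result[key] != b_val:
            return None
    return result


def core(fs: dict) -> dict:
    """Features of an item without its valency slots."""
    return {k: v for k, v in fs.items() if k != "val"}


def try_merge(left, right):
    """Group two adjacent items if a valency slot of one unifies with the other.

    The new group takes the satisfying neighbour's features, overridden by the
    slot's "yields"; the slot owner's other slots are dropped in this sketch.
    """
    l_words, l_fs = left
    r_words, r_fs = right
    for slot in l_fs.get("val", []):
        if slot["dir"] == "R" and unify(slot["needs"], core(r_fs)) is not None:
            return (l_words + r_words, {**r_fs, **slot["yields"]})
    for slot in r_fs.get("val", []):
        if slot["dir"] == "L" and unify(slot["needs"], core(l_fs)) is not None:
            return (l_words + r_words, {**l_fs, **slot["yields"]})
    return None


def chunk(tokens):
    """Greedily merge adjacent items; groups can merge again (recursive grouping)."""
    items = [([w], fs) for w, fs in tokens]
    merged = True
    while merged:
        merged = False
        for i in range(len(items) - 1):
            group = try_merge(items[i], items[i + 1])
            if group is not None:
                items[i:i + 2] = [group]
                merged = True
                break
    return items


def unsatisfied(items):
    """Sentence-level check: items with an unfilled slot hint at a tagging error."""
    return [" ".join(words) for words, fs in items if fs.get("val")]


# Hypothetical lexicon entries (transliterated, for illustration only).
ADJ = {"cat": "adj", "val": [{"dir": "R", "needs": {"cat": "noun"}, "yields": {}}]}
POST = {"cat": "post", "val": [{"dir": "L", "needs": {"cat": "noun"}, "yields": {"cat": "pp"}}]}

sentence = [
    ("chhele", {"cat": "noun"}),
    ("bhalo", ADJ),
    ("ghar", {"cat": "noun"}),
    ("theke", POST),
    ("elo", {"cat": "verb"}),
]
print([" ".join(w) for w, _ in chunk(sentence)])  # ['chhele', 'bhalo ghar theke', 'elo']
print(unsatisfied(chunk([("bhalo", ADJ), ("theke", POST)])))  # ['bhalo', 'theke']
```

Here the adjective first groups with the noun to its right, and that noun group then satisfies the postposition's left valency, which shows groups participating recursively. In the second call, neither slot can be filled, so both words are reported, mimicking how leftover constraints could flag a suspicious POS tag.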
Similar resources
An Affinity Based Greedy Approach towards Chunking for Indian Languages
A robust chunker can drastically reduce the complexity of parsing natural language text. Chunking for Indian languages requires a novel approach because of the relatively unrestricted order of words within a word group. A computational framework for chunking based on valency theory and feature structures has been described here. The paper also draws an analogy of chunk formation in free word ...
Feature Engineering in Persian Dependency Parser
A dependency parser is one of the most important fundamental tools in natural language processing: it extracts the structure of sentences and determines the relations between words based on dependency grammar. Dependency parsers are well suited to free word order languages such as Persian. In this paper, a data-driven dependency parser has been developed with the help of phrase-structure parser fo...
Semantic Role Labeling of Persian Sentences Using a Memory-Based Learning Approach
Extracting semantic roles is one of the major steps in representing text meaning. It refers to finding the semantic relations between a predicate and the syntactic constituents of a sentence. In this paper we present a semantic role labeling system for Persian, using a memory-based learning model and standard features. Our proposed system implements a two-phase architecture to first identify...
Robust and efficient semantic parsing of free word order languages in spoken dialogue systems
This paper presents a semantic parser for spoken dialogue systems. The parser is designed especially for the analysis of free word order languages by providing a feature called order-independent matching. We describe how this feature allows rules for free word order languages to be written in an elegant way (using German as example language) and how it increases the robustness against speech recogn...
Studying impressive parameters on the performance of Persian probabilistic context free grammar parser
In linguistics, a treebank is a parsed text corpus that annotates syntactic or semantic sentence structure. The exploitation of treebank data has been important ever since the first large-scale treebank, the Penn Treebank, was published. However, although treebanks originated in computational linguistics, their value is becoming more widely appreciated in linguistics research as a whole. F...